Fix: Add circuit breaker and safe exponential retry for OpenRouter#114
Fix: Add circuit breaker and safe exponential retry for OpenRouter#114navin-oss wants to merge 3 commits intoAnkanMisra:mainfrom
Conversation
… close response bodies, context-aware backoff
📝 WalkthroughWalkthroughThe PR implements circuit breaker and retry mechanisms for OpenRouter AI service calls. A new dependency (gobreaker) is added, circuit breaker logic is initialized with failure thresholds, retry logic with exponential backoff is implemented, and main.go is updated to integrate both patterns and return appropriate HTTP status codes for different failure scenarios. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Gateway
participant CircuitBreaker
participant RetryHandler
participant OpenRouter
Client->>Gateway: POST /summarize (with request body)
Gateway->>CircuitBreaker: Check state
alt Circuit Open
CircuitBreaker-->>Gateway: ErrOpenState
Gateway-->>Client: 503 Service Unavailable
else Circuit Closed/HalfOpen
CircuitBreaker->>RetryHandler: Execute request
loop Retry Logic (up to 3 attempts)
RetryHandler->>OpenRouter: HTTP POST
alt 4xx Response
OpenRouter-->>RetryHandler: 4xx error
RetryHandler-->>CircuitBreaker: Return error
else 5xx Response
OpenRouter-->>RetryHandler: 5xx error
RetryHandler->>RetryHandler: Exponential backoff sleep
RetryHandler->>OpenRouter: Retry request
else 2xx/3xx Response
OpenRouter-->>RetryHandler: Success
RetryHandler-->>CircuitBreaker: Return response
end
end
CircuitBreaker-->>Gateway: Response or error
alt Success
Gateway-->>Client: 200 with AI summary
else Timeout
Gateway-->>Client: 504 Gateway Timeout
else Other Failure
Gateway-->>Client: 500 Internal Server Error
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| } | ||
|
|
||
| resp := result.(*http.Response) | ||
| defer resp.Body.Close() |
There was a problem hiding this comment.
Unsafe type assertion
openRouterCB.Execute returns interface{}; resp := result.(*http.Response) will panic if the callback ever returns a non-*http.Response (including a nil interface). This is a merge-blocker because it turns transient upstream failures into a gateway crash. Guard the assertion (check ok and non-nil) and return a normal error if the type is unexpected.
Prompt To Fix With AI
This is a comment left during a code review.
Path: gateway/main.go
Line: 566:569
Comment:
**Unsafe type assertion**
`openRouterCB.Execute` returns `interface{}`; `resp := result.(*http.Response)` will panic if the callback ever returns a non-`*http.Response` (including a `nil` interface). This is a merge-blocker because it turns transient upstream failures into a gateway crash. Guard the assertion (check `ok` and non-nil) and return a normal error if the type is unexpected.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
gateway/main.go (1)
80-110:⚠️ Potential issue | 🟠 MajorFix code formatting and resolve test failures before merging.
The gateway code requires formatting and has failing tests:
- Apply formatting: Run
cd gateway && go fmt ./...to format the code (formatting issues detected)- Verify linting:
go vet ./...passes- Fix failing tests:
TestCallOpenRouter_RespectsContextTimeoutTestHandleSummarize_AIRequestTimeoutReturns504Ensure all tests pass with
cd gateway && go test -v ./...before submitting.
🤖 Fix all issues with AI agents
In `@gateway/main.go`:
- Line 31: Update callOpenRouter and its caller to treat
gobreaker.ErrTooManyRequests the same as gobreaker.ErrOpenState and map both
cases to a rejection HTTP status (e.g., 503); replace direct error equality
checks with errors.Is(err, gobreaker.ErrOpenState) and errors.Is(err,
gobreaker.ErrTooManyRequests) to handle sentinel errors robustly. Specifically,
in the function callOpenRouter and the calling code that currently checks for
ErrOpenState, add an OR branch to check errors.Is(...,
gobreaker.ErrTooManyRequests) and return the same rejection handling/path
(status and message) as for ErrOpenState.
In `@gateway/openrouter_retry.go`:
- Around line 26-40: After the call to http.DefaultClient.Do(req) in
gateway/openrouter_retry.go, ensure response bodies are closed when Do returns a
non-nil resp alongside an error to avoid leaked connections; specifically, after
the call to http.DefaultClient.Do (the resp, err variables), add logic that if
err != nil && resp != nil then call resp.Body.Close() before handling/returning
the error or retrying, keeping the existing successful-path resp.Body.Close()
for 5xx cases so you don't close the same body twice.
🧹 Nitpick comments (1)
gateway/go.mod (1)
12-12: Consider upgrading gobreaker to a newer version.No known security advisories found for v1.0.0; however, this version is significantly outdated. The latest version is v2.4.0 (released January 1, 2026). While v1.0.0 is safe to use, consider upgrading to v2.4.0 for access to recent bug fixes and improvements.
|
@navin-oss might take a look at the conflicts and put your branch latest with my main branch |
Closes #101
Adds resilience to OpenRouter calls:
• Circuit breaker using sony/gobreaker
• Exponential backoff retries (3 attempts)
• Reset request body using req.GetBody() before retries
• Closed resp.Body before retry to prevent connection leaks
• Context-aware backoff (no blocking sleep)
• Avoid retrying 4xx responses
• Safe type assertions in main.go to prevent panics
Implements all requested review changes and fixes production issues identified by Greptile and CodeRabbit.
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Greptile Overview
Greptile Summary
This PR adds resilience around OpenRouter calls by introducing a global circuit breaker (sony/gobreaker) and an exponential backoff retry helper that resets request bodies and closes response bodies before retrying. The summarize handler now treats an open circuit as a 503 while keeping existing 504 timeout handling.
Key merge blockers:
callOpenRouterperforms an uncheckedinterface{}->*http.Responseassertion on the circuit breaker return value, which can panic and crash the gateway if the callback ever returns an unexpected type / nil.Confidence Score: 2/5
Important Files Changed
Sequence Diagram
sequenceDiagram autonumber participant C as Client participant G as Gateway (gin) participant CB as CircuitBreaker participant R as RetryHelper participant OR as OpenRouter API C->>G: POST /api/ai/summarize (text + payment headers) G->>G: verifyPayment() alt payment invalid G-->>C: 402/403/400 else payment valid G->>CB: Execute(doRequestWithRetry(req)) alt Circuit open CB-->>G: ErrOpenState G-->>C: 503 Service Unavailable else Circuit closed CB->>R: doRequestWithRetry(req) loop up to 3 attempts R->>R: reset req.Body via req.GetBody() R->>OR: HTTP POST /chat/completions alt 2xx/4xx OR-->>R: resp (Status<500) R-->>CB: resp else 5xx or network error OR-->>R: resp 5xx / error R->>R: close resp.Body (if present) R->>R: backoff (context-aware) end end CB-->>G: resp G->>G: decode JSON choices[0].message.content G-->>C: 200 {result} + X-402-Receipt end end(3/5) Reply to the agent's comments like "Can you suggest a fix for this @greptileai?" or ask follow-up questions!
Context used:
dashboard- AGENTS.md (source)